14 research outputs found

    Anaphora resolution for Bengali: An experiment with domain adaptation

    In this paper we present our first attempt at anaphora resolution for a resource-poor language, namely Bengali. We address the issue of adapting a state-of-the-art system, BART, which was originally developed for English. The overall performance of co-reference resolution depends greatly on highly accurate mention detectors. We develop a number of models based on the heuristics used as well as on the particular machine learning method employed. Thereafter we perform a series of experiments for adapting BART to Bengali. Our evaluation shows that a language-dependent system (designed primarily for English) can achieve a good performance level when re-trained and tested on a new language with proper subsets of features. The system produces recall, precision and F-measure values of 56.00%, 46.50% and 50.80%, respectively. The contribution of this work is two-fold: (i) an attempt to build a machine-learning-based anaphora resolution system for a resource-poor Indian language; and (ii) domain adaptation of a state-of-the-art English co-reference resolution system for Bengali, which has a completely different orthography and characteristics.
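    As a quick consistency check on the reported scores, the sketch below computes the balanced F-measure as the harmonic mean of precision and recall. It assumes the abstract's F-measure is the standard F1 (the abstract does not say so explicitly); the function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Scores quoted in the abstract, in percent (assumed to be rounded values).
reported_precision = 46.50
reported_recall = 56.00

f1 = f_measure(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")
```

    Under that assumption the result comes out at roughly 50.8%, in line with the reported figure; any difference in the last digit is likely due to rounding of the published precision and recall.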

    Annotating a broad range of anaphoric phenomena, in a variety of genres: The ARRAU Corpus

    This paper presents the second release of ARRAU, a multigenre corpus of anaphoric information created over 10 years to provide data for the next generation of coreference/anaphora resolution systems combining different types of linguistic and world knowledge with advanced discourse modeling supporting rich linguistic annotations. The distinguishing features of ARRAU include the following: treating all NPs as markables, including non-referring NPs, and annotating their (non-)referentiality status; distinguishing between several categories of non-referentiality and annotating non-anaphoric mentions; thorough annotation of markable boundaries (minimal/maximal spans, discontinuous markables); annotating a variety of mention attributes, ranging from morphosyntactic parameters to semantic category; annotating the genericity status of mentions; annotating a wide range of anaphoric relations, including bridging relations and discourse deixis; and, finally, annotating anaphoric ambiguity. The current version of the dataset contains 350K tokens and is publicly available from LDC. In this paper, we discuss in detail all the distinguishing features of the corpus, so far only partially presented in a number of conference and workshop papers, and we also discuss the development between the first release of ARRAU in 2008 and this second one.

    Multi-metric optimization for coreference: The UniTN/IITP/Essex submission to the 2011 CONLL Shared Task

    Because there is no generally accepted metric for measuring the performance of anaphora resolution systems, a combination of metrics was proposed to evaluate submissions to the 2011 CONLL Shared Task (Pradhan et al., 2011). We therefore investigate Multi-objective function Optimization (MOO) techniques based on Genetic Algorithms to optimize models according to multiple metrics simultaneously.
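    To illustrate what optimizing "according to multiple metrics simultaneously" involves, the sketch below implements Pareto dominance, the comparison that multi-objective optimizers typically rely on. It is not the submission's genetic algorithm: the candidate names, the three-metric tuples, and all score values are hypothetical, and the abstract does not name the metrics used.

```python
from typing import Dict, Tuple

Scores = Tuple[float, ...]  # one score per metric, higher is better


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every metric and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(candidates: Dict[str, Scores]) -> Dict[str, Scores]:
    """Keep only the candidates that no other candidate dominates."""
    return {
        name: scores
        for name, scores in candidates.items()
        if not any(
            dominates(other, scores)
            for other_name, other in candidates.items()
            if other_name != name
        )
    }


# Hypothetical models scored on three hypothetical metrics.
candidates = {
    "model_a": (60.0, 65.0, 40.0),
    "model_b": (58.0, 66.0, 42.0),
    "model_c": (57.0, 64.0, 39.0),  # worse than model_a on every metric
}

front = pareto_front(candidates)
print(sorted(front))
```

    Here `model_c` is dropped because `model_a` beats it on all three metrics, while `model_a` and `model_b` both survive since each is better on at least one metric. A genetic algorithm would evolve candidates toward such a non-dominated set instead of enumerating them.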

    Transverse structure and energy deposition by a subTW femtosecond laser in air: from single filament to superfilament

    We experimentally traced the transition from a single air filament to a superfilament under the action of a powerful, loosely focused (NA ~ 0.0021) femtosecond beam. Two regimes of multifilament formation were exploited, using either artificial amplitude modulation or intrinsic amplitude/phase front modulation of a beam carrying 10–60 critical powers P_cr. The transverse spatial structure and energy density in the filament were studied using single-shot wideband acoustic detection and beam mode imaging at different distances along the optical path. We showed that with intrinsic front modulation a single extremely long ionized channel is formed, provided the peak power P of the initial beam does not exceed 20 P_cr. Its volumetric energy density is ~1.5–3 times higher than in the single filament, while its linear energy density is almost 10 times higher. Artificial amplitude modulation leads to the formation of either a single long filament or two closely spaced filaments under the same initial conditions. The maximal volumetric energy density was the same in both cases and slightly lower than without this modulation. A few closely spaced filaments are generated at higher peak powers P, with the volumetric and linear energy densities increasing rapidly and nonlinearly with P. The highest linear energy density achieved was 600 μJ cm⁻¹, i.e. almost 100 times higher than that of the single filament, with only a 10-fold increase in energy. The volumetric energy density also increases by a factor of 10, to ~800 mJ cm⁻³, indicating the large increase in intensity and electron density that is characteristic of superfilamentation. These findings were supported by numerical simulations based on the Forward Maxwell equation with a resolved driver of the field, which showed superfilament splitting and confirmed the energy densities estimated from the experimental data.

    Anaphora Resolution with the ARRAU Corpus

    The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and the annotation of a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. The corpus, however, has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task: identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis. We also discuss the evaluation scripts assessing system performance on those datasets, and preliminary results on these three tasks that may serve as a baseline for subsequent research on these phenomena.

    Can machine learning aid in delivering new use cases and scenarios in 5G?

    5G represents the next generation of communication networks and services, and will bring a new set of use cases and scenarios. These in turn will address a new set of challenges from the network and service management perspective, such as network traffic and resource management, big data management and energy efficiency. Consequently, novel techniques and strategies are required to address these challenges in a smarter way. In this paper, we present the limitations of current network and service management and describe in detail the challenges that 5G is expected to face from a management perspective. The main contribution of this paper is a set of 5G use cases and scenarios in which machine learning can help address the associated management challenges. Machine learning is expected to provide a higher and more intelligent level of monitoring and management of networks and applications, improve operational efficiency and help meet the requirements of the future 5G network.